structural equation model
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Saarland (0.04)
- North America > United States > California (0.04)
- Asia > Japan (0.04)
- Asia > Middle East > Jordan (0.05)
- North America > United States > North Carolina (0.04)
- North America > United States > Michigan (0.04)
- (3 more...)
- Information Technology > Data Science (0.68)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Research Report > Strength High (0.68)
BCDNets: ScalableVariationalApproachesfor BayesianCausalDiscovery
Recent advances have enabled effective maximum-likelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is non-identifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the the discrete and combinatorial nature of graphs.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
Causal Interpretation of Self-Attention in Pre-Trained Transformers
We propose a causal interpretation of self-attention in the Transformer neural network architecture. We interpret self-attention as a mechanism that estimates a structural equation model for a given input sequence of symbols (tokens). The structural equation model can be interpreted, in turn, as a causal structure over the input symbols under the specific context of the input sequence. Importantly, this interpretation remains valid in the presence of latent confounders. Following this interpretation, we estimate conditional independence relations between input symbols by calculating partial correlations between their corresponding representations in the deepest attention layer. This enables learning the causal structure over an input sequence using existing constraint-based algorithms. In this sense, existing pre-trained Transformers can be utilized for zero-shot causal-discovery. We demonstrate this method by providing causal explanations for the outcomes of Transformers in two tasks: sentiment classification (NLP) and recommendation.
Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach
Structural equation models (SEMs) are widely used in sciences, ranging from economics to psychology, to uncover causal relationships underlying a complex system under consideration and estimate structural parameters of interest. We study estimation in a class of generalized SEMs where the object of interest is defined as the solution to a linear operator equation. We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using the stochastic gradient descent. We consider both 2-layer and multi-layer NNs with ReLU activation functions and prove global convergence in an overparametrized regime, where the number of neurons is diverging. The results are established using techniques from online learning and local linearization of NNs, and improve in several aspects the current state-of-the-art. For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
- North America > United States > California > Los Angeles County > Los Angeles (0.29)
- North America > United States > Oregon > Benton County > Corvallis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- (4 more...)